On-line outlier detection and data cleaning

Authors

  • Hancong Liu
  • Sirish Shah
  • Wei Jiang
Abstract

Outliers are observations that do not follow the statistical distribution of the bulk of the data, and consequently may lead to erroneous results with respect to statistical analysis. Many conventional outlier detection tools are based on the assumption that the data is identically and independently distributed. In this paper, an outlier-resistant data filter-cleaner is proposed. The proposed data filter-cleaner includes an on-line outlier-resistant estimate of the process model and combines it with a modified Kalman filter to detect and “clean” outliers. The advantage over existing methods is that the proposed method has the following features: (a) a priori knowledge of the process model is not required; (b) it is applicable to autocorrelated data; (c) it can be implemented on-line; and (d) it tries to only clean (i.e., detects and replaces) outliers and preserves all other information in the data. © 2004 Elsevier Ltd. All rights reserved.
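The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general "detect and replace on-line, leave everything else untouched" idea. It swaps in a causal Hampel-style rule (running median plus scaled median absolute deviation) for the paper's outlier-resistant model estimate and modified Kalman filter. The function name `clean_stream` and the parameters `window`, `min_history`, `threshold`, and `min_scale` are hypothetical choices for illustration, not values from the paper.

```python
from collections import deque
from statistics import median


def clean_stream(values, window=7, min_history=5, threshold=3.0, min_scale=1e-6):
    """Causal Hampel-style cleaner (illustrative; not the paper's algorithm).

    Returns (cleaned, flags): flagged points are replaced by the running
    median of recent cleaned values; all other points pass through unchanged.
    """
    history = deque(maxlen=window)  # most recent *cleaned* values only
    cleaned, flags = [], []
    for x in values:
        if len(history) < min_history:
            # Warm-up: too few points for a scale estimate, so accept as-is.
            is_outlier, x_clean = False, x
        else:
            centre = median(history)
            # MAD scaled by 1.4826 estimates the standard deviation for Gaussian data.
            mad = median([abs(h - centre) for h in history])
            scale = max(1.4826 * mad, min_scale)
            is_outlier = abs(x - centre) > threshold * scale
            x_clean = centre if is_outlier else x
        cleaned.append(x_clean)
        flags.append(is_outlier)
        history.append(x_clean)  # outliers never contaminate later estimates
    return cleaned, flags


# Hypothetical sample stream with one spike at index 7.
data = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 50.0, 1.02, 0.98]
cleaned, flags = clean_stream(data)
```

On this sample only the spike at index 7 is flagged, and it is replaced by the running median 1.0. A moving median ignores autocorrelation structure, which is the case the paper targets with an estimated process model, so this sketch should be read as a simplified stand-in rather than an equivalent method.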


Similar resources

An Efficient RFID Data Cleaning Method Based on Wavelet Density Estimation

A large amount of noise is usually carried in the original RFID data and needs to be cleaned up before further processing. Outlier detection is an effective method for RFID data cleaning. In this paper, a point probability data model is proposed to describe uncertain RFID data streams. The wavelet density threshold is incorporated in this method to adaptively detect the outliers in the sl...


Outlier Cleaning and Sensor Data Aggregation Using Modified Z-score Technique

The outlier detection is an important preprocessing routine that is required to ensure robustness of the sensory data analysis. Several factors make the wireless sensor networks (WSNs) especially prone to outliers. A sensor network is equipped with thousands of inexpensive, low fidelity nodes, which can easily generate sensing errors. Since these networks include a large number of sensors, the ...


Knowledge discovery in rubber extrusion processes

This paper describes the outcomes of a study that the EDMANS group has recently performed on a rubber extrusion process, focusing on the knowledge discovery phase prior to system modeling. Some of the tools developed to satisfy the special needs of such a process are also presented: the CiTree algorithm for clustering subpopulations in massive databases and the PAELLA algorithm for o...


Outlier Detection Using Extreme Learning Machines Based on Quantum Fuzzy C-Means

One of the most important concerns of a data miner is always to have accurate and error-free data: data that contains no human errors and whose records are complete and correct. In this paper, a new learning model based on an extreme learning machine neural network is proposed for outlier detection. The function of neural networks depends on various parameters such as the structur...


Outlier Detection in Dynamic Systems with Multiple Operating Points and Application to Improve Industrial Flare Monitoring

In chemical industries, process operations are usually comprised of several discrete operating regions with distributions that drift over time. These complexities complicate outlier detection in the presence of intrinsic process dynamics. In this article, we consider the problem of detecting univariate outliers in dynamic systems with multiple operating points. A novel method combining the time...



Journal:
  • Computers & Chemical Engineering

Volume 28  Issue 

Pages  -

Publication date 2004